Discrete
Bernoulli distribution
$$f_X(x) = P(X=x) = \begin{cases} (1-p)^{1-x}\, p^{x} & \text{for } x = 0 \text{ or } 1 \\ 0 & \text{otherwise} \end{cases}$$
expectation
$$E(X) = p$$
variance
$$\operatorname{var}(X) = p(1-p)$$
Binomial distribution
$$f_X(k) = P(X=k) = \begin{cases} C_{n}^{k}\, p^{k}(1-p)^{n-k} & \text{for } k = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
expectation
$$E(X) = np$$
variance
$$\operatorname{var}(X) = np(1-p)$$
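A quick numerical sanity check of these pmf, mean, and variance formulas (a minimal sketch, assuming scipy is available; the values of $n$ and $p$ below are arbitrary):

```python
# Sanity-check the binomial pmf, mean, and variance with scipy.stats.
# n and p are arbitrary example values, not from the notes.
from scipy.stats import binom

n, p = 10, 0.3
rv = binom(n, p)

total = sum(rv.pmf(k) for k in range(n + 1))  # pmf sums to 1 over k = 0..n
print(total)                                  # ~1.0
print(rv.mean(), n * p)                       # E(X) = np
print(rv.var(), n * p * (1 - p))              # var(X) = np(1-p)
```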
Geometric distribution
$$f_X(k) = P(X=k) = \begin{cases} p(1-p)^{k-1} & \text{for } k = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$
expectation
$$E(X) = \frac{1}{p}$$
variance
$$\operatorname{var}(X) = \frac{1-p}{p^{2}}$$
memoryless
$$P(X > m+n \mid X > m) = P(X > n)$$
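A small Monte Carlo sketch of the memoryless property (assuming numpy is available; $p$, $m$, $n$ below are arbitrary example values; numpy's geometric sampler counts trials up to and including the first success, matching the pmf above):

```python
# Monte Carlo check of P(X > m+n | X > m) = P(X > n) for a geometric r.v.
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 0.2, 3, 5                      # arbitrary example values
x = rng.geometric(p, size=1_000_000)     # support {1, 2, 3, ...}

lhs = np.mean(x[x > m] > m + n)          # estimate of P(X > m+n | X > m)
rhs = np.mean(x > n)                     # estimate of P(X > n)
print(lhs, rhs, (1 - p) ** n)            # both ≈ (1-p)^n
```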
Negative binomial distribution (Pascal)
The negative binomial distribution arises as a generalization of the geometric distribution.
Suppose that a sequence of independent trials, each with probability of success $p$, is performed until there are $r$ successes in all.
The last trial must be a success, and the preceding $k-1$ trials must contain exactly $r-1$ successes, so the probability can be written as $p \cdot C_{k-1}^{r-1}\, p^{r-1}(1-p)^{(k-1)-(r-1)}$.
This is denoted $X \sim NB(r, p)$.
$$f_X(k) = P(X=k) = \begin{cases} C_{k-1}^{r-1}\, p^{r}(1-p)^{k-r} & \text{for } k = r, r+1, r+2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
expectation
$$E(X) = \frac{r}{p}$$
variance
$$\operatorname{var}(X) = \frac{r(1-p)}{p^{2}}$$
The derivation can be seen there.
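Because $X$ counts trials until the $r$-th success, it can also be viewed as a sum of $r$ independent geometric waiting times; a minimal simulation sketch (numpy assumed; $r$ and $p$ are arbitrary example values):

```python
# Simulate NB(r, p) as the sum of r independent geometric waiting times
# and compare the sample mean/variance with r/p and r(1-p)/p^2.
import numpy as np

rng = np.random.default_rng(0)
r, p, size = 4, 0.3, 1_000_000           # arbitrary example values

# each row: r geometric waiting times; their sum is the trial of the r-th success
x = rng.geometric(p, size=(size, r)).sum(axis=1)

print(x.mean(), r / p)                   # E(X) = r/p
print(x.var(), r * (1 - p) / p ** 2)     # var(X) = r(1-p)/p^2
```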
Hypergeometric distribution
Suppose that an urn contains $n$ balls, of which $r$ are black and $n-r$ are white. Let $X$ denote the number of black balls drawn when taking $m$ balls without replacement.
This is denoted $X \sim h(m, n, r)$.
pmf
$$f_X(k) = P(X=k) = \begin{cases} \dfrac{C_{r}^{k}\, C_{n-r}^{m-k}}{C_{n}^{m}} & \text{for } \max(0, m-n+r) \le k \le \min(m, r) \\ 0 & \text{otherwise} \end{cases}$$
expectation
$$E(X) = m\,\frac{r}{n}$$
variance
$$\operatorname{var}(X) = \frac{mr(n-m)(n-r)}{n^{2}(n-1)}$$
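A check of these formulas with scipy (a sketch, assuming scipy is available; note scipy's hypergeom takes $(M, n, N)$ = (total balls, black balls, draws), corresponding to $(n, r, m)$ in the notes; the values below are arbitrary):

```python
# Check the hypergeometric mean/variance formulas with scipy.stats.
# scipy.stats.hypergeom(M, n, N): M = total balls, n = black balls, N = draws,
# i.e. (n, r, m) in the notation of these notes.
from scipy.stats import hypergeom

n_total, r_black, m_draws = 20, 7, 5     # arbitrary example values
rv = hypergeom(M=n_total, n=r_black, N=m_draws)

print(rv.mean(), m_draws * r_black / n_total)
print(rv.var(),
      m_draws * r_black * (n_total - m_draws) * (n_total - r_black)
      / (n_total ** 2 * (n_total - 1)))
```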
Poisson distribution
The Poisson distribution can be derived as the limit of a binomial distribution as the number of trials approaches infinity and the probability of success on each trial approaches zero in such a way that $np = \lambda$; here $\lambda$ can be interpreted as the expected number of successes.
pmf
$$P(X = k) = \frac{\lambda^{k}}{k!}\, e^{-\lambda}, \quad k = 0, 1, 2, \ldots$$
expectation
$$E(X) = \lambda$$
variance
$$\operatorname{var}(X) = \lambda$$
Property
Let $X$ and $Y$ be independent Poisson r.v.s with parameters $\theta_1$ and $\theta_2$; then $X + Y \sim \mathrm{Poisson}(\theta_1 + \theta_2)$.
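A minimal numerical sketch of the binomial-to-Poisson limit described above (scipy assumed; $\lambda$ and the values of $n$ are arbitrary):

```python
# Numerically illustrate Binomial(n, lam/n) -> Poisson(lam) as n grows.
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0                                   # arbitrary rate
ks = np.arange(0, 11)
for n in (10, 100, 10_000):
    gap = np.max(np.abs(binom.pmf(ks, n, lam / n) - poisson.pmf(ks, lam)))
    print(n, gap)                           # the gap shrinks as n increases
```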
Continuous
Uniform distribution
A uniform r.v. on the interval $[a, b]$ is a model for what we mean when we say "choose a number at random between $a$ and $b$".
pdf
$$f_X(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
$$F_X(x) = \begin{cases} 0 & x \le a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & b \le x \end{cases}$$
expectation
$$E(X) = \frac{a+b}{2}$$
variance
$$\operatorname{var}(X) = \frac{(b-a)^{2}}{12}$$
Exponential distribution
The exponential distribution is often used to model lifetimes or waiting times, in which context it is conventional to replace $x$ by $t$.
pdf
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
expectation
$$E(X) = \frac{1}{\lambda}$$
variance
$$\operatorname{var}(X) = \frac{1}{\lambda^{2}}$$
Memoryless
$$P(X > s+t \mid X > s) = P(X > t)$$
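The property follows from $P(X>s+t)/P(X>s) = e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$; a quick numerical sketch using scipy's survival function (scipy assumed; $\lambda$, $s$, $t$ are arbitrary example values):

```python
# Verify P(X > s+t | X > s) = P(X > t) for an exponential r.v. using the
# survival function. scipy parameterizes expon by scale = 1/lambda.
from scipy.stats import expon

lam, s, t = 0.5, 2.0, 3.0                 # arbitrary example values
rv = expon(scale=1 / lam)

cond = rv.sf(s + t) / rv.sf(s)            # P(X > s+t) / P(X > s)
print(cond, rv.sf(t))                     # both equal e^{-lam*t}
```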
Gamma distribution
$$g(t) = \begin{cases} \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-\lambda t} & t \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$\Gamma(x) = \int_0^{\infty} u^{x-1} e^{-u}\, du, \quad x > 0$$
expectation
$$E(X) = \frac{\alpha}{\lambda}$$
variance
$$\operatorname{Var}(X) = \frac{\alpha}{\lambda^{2}}$$
Property
$Ga(1, \lambda) = \mathrm{Exp}(\lambda)$
$Ga(\frac{n}{2}, \frac{1}{2}) = \chi^{2}(n)$, for which $E(X) = n$ and $\operatorname{Var}(X) = 2n$
If $X \sim Ga(\alpha, \lambda)$, then $kX \sim Ga(\alpha, \frac{\lambda}{k})$ for $k > 0$
If $X \sim Ga(\alpha, \lambda)$ and $Y \sim Ga(\beta, \lambda)$ are independent, then $X + Y \sim Ga(\alpha + \beta, \lambda)$
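The first two identities can be checked numerically; a minimal sketch assuming scipy is available (scipy's gamma uses shape $a = \alpha$ and scale $= 1/\lambda$; $\lambda$ and $n$ below are arbitrary):

```python
# Check Ga(1, lam) = Exp(lam) and Ga(n/2, 1/2) = chi^2(n) by comparing pdfs.
# scipy's gamma(a, scale) corresponds to Ga(alpha = a, lambda = 1/scale).
import numpy as np
from scipy.stats import gamma, expon, chi2

x = np.linspace(0.01, 20, 200)
lam, n = 0.7, 5                            # arbitrary example values

print(np.allclose(gamma(a=1, scale=1 / lam).pdf(x), expon(scale=1 / lam).pdf(x)))
print(np.allclose(gamma(a=n / 2, scale=2).pdf(x), chi2(n).pdf(x)))
```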
Derivation
$$\because \Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1} e^{-x}\, dx$$
Substituting $x = \lambda t$:
$$\Gamma(\alpha) = \lambda^{\alpha} \int_{0}^{\infty} t^{\alpha-1} e^{-\lambda t}\, dt$$
$$\therefore \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_{0}^{\infty} t^{\alpha-1} e^{-\lambda t}\, dt = 1$$
$$\therefore g(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-\lambda t}, \quad t \ge 0$$
$\alpha$ is called a shape parameter for the gamma density; varying $\alpha$ changes the shape of the density.
$\lambda$ is called a scale parameter; varying $\lambda$ corresponds to changing the units of measurement and does not affect the shape of the density.
how to understand gamma?
Normal distribution
pdf
$$f_X(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(x-\mu)^{2} / (2\sigma^{2})}, \quad -\infty < x < \infty$$
$\mu$ is the mean and $\sigma$ is the standard deviation.
If $X \sim N(\mu, \sigma^{2})$ and $Y = aX + b$, then $Y \sim N(a\mu + b, a^{2}\sigma^{2})$.
In particular, if $X \sim N(\mu, \sigma^{2})$, then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$.
For jointly normal $X$ and $Y$: $aX + bY \sim N(a\mu_X + b\mu_Y,\ a^{2}\sigma_X^{2} + b^{2}\sigma_Y^{2} + 2ab\rho\,\sigma_X \sigma_Y)$.
property
If $X, Y \sim N(0, 1)$ are independent, then $U = \frac{X}{Y}$ is a Cauchy r.v. (lec3):
$$f_U(u) = \frac{1}{\pi(u^{2}+1)}$$
If $X_1, \ldots, X_n \sim N(0, 1)$, i.i.d., then
$$X_1^{2} + \cdots + X_n^{2} \sim \chi^{2}(n)$$
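A small simulation sketch of the last fact (numpy assumed; $n$ and the number of replications are arbitrary): the sum of squares should have mean $n$ and variance $2n$, matching $\chi^{2}(n)$:

```python
# Monte Carlo check that X_1^2 + ... + X_n^2 ~ chi^2(n) for i.i.d. N(0,1) X_i.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 6, 500_000                       # arbitrary example values
s = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

print(s.mean(), n)                         # chi^2(n) has mean n
print(s.var(), 2 * n)                      # ... and variance 2n
```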
Logistic distribution
Consider the standard logistic distribution $(0, 1)$:
$$F_X(x) = \frac{1}{1+e^{-x}}$$
Exponential family
A family of pdfs or pmfs is called an exponential family if it can be expressed as
$$p(x, \theta) = H(x)\, \exp\!\left(\theta^{T} \phi(x) - A(\theta)\right)$$
Here $H(x)$ is the base measure, $\phi(x)$ the sufficient statistic, $\theta$ the natural parameter, and $A(\theta)$ the log-partition function that normalizes the density.
This form is very helpful for modeling heterogeneous data in the era of big data.
Bernoulli, Gaussian, Binomial, Poisson, Exponential, Weibull, Laplace, Gamma, Beta, Multinomial, Wishart distributions are all exponential families
For the Bernoulli distribution:
$$X \sim p^{x}(1-p)^{1-x}, \quad \text{for } x \in \{0, 1\}$$
$$p^{x}(1-p)^{1-x} = \exp\{x \ln p + (1-x)\ln(1-p)\} = \exp\left\{x \ln\frac{p}{1-p} + \ln(1-p)\right\}$$
so $\theta = \ln\frac{p}{1-p}$, $\phi(x) = x$, $A(\theta) = \ln\frac{1}{1-p} = \ln(1 + e^{\theta})$, and $H(x) = 1$.
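A minimal sketch that rebuilds the Bernoulli pmf from these exponential-family pieces (numpy assumed; the value of $p$ is arbitrary):

```python
# Reconstruct the Bernoulli pmf from H(x) * exp(theta * phi(x) - A(theta)).
import numpy as np

p = 0.3                                    # arbitrary example value
theta = np.log(p / (1 - p))                # natural parameter
A = np.log(1 + np.exp(theta))              # log-partition, equals -ln(1-p)

for x in (0, 1):
    ef_form = 1.0 * np.exp(theta * x - A)  # H(x) = 1, phi(x) = x
    direct = p ** x * (1 - p) ** (1 - x)
    print(x, ef_form, direct)              # the two forms agree
```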
The explanation can be seen here.
Sample
$$\operatorname{Var}(\bar{X}) = \frac{\sigma^{2}}{n}$$
$$(n-1)S^{2} = \sum_{i=1}^{n} X_i^{2} - n\bar{X}^{2}$$
For an i.i.d. sample from $N(\mu, \sigma^{2})$:
$\bar{X}$ and $S^{2}$ are independent
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right)$$
$$\frac{(n-1)S^{2}}{\sigma^{2}} \sim \chi^{2}(n-1)$$
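A simulation sketch of these sampling-distribution facts (numpy assumed; $\mu$, $\sigma$, $n$ and the number of replications are arbitrary):

```python
# Check Var(X_bar) = sigma^2/n and (n-1)S^2/sigma^2 ~ chi^2(n-1) by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 8, 200_000   # arbitrary example values
samples = rng.normal(mu, sigma, size=(reps, n))

xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)            # unbiased sample variance S^2
q = (n - 1) * s2 / sigma ** 2

print(xbar.var(), sigma ** 2 / n)           # Var(X_bar) = sigma^2 / n
print(q.mean(), n - 1)                      # chi^2(n-1) has mean n-1
print(q.var(), 2 * (n - 1))                 # ... and variance 2(n-1)
```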
Property
$$E(X) = E(E(X \mid Y))$$
$$\operatorname{Var}(X) = E(\operatorname{Var}(X \mid Y)) + \operatorname{Var}(E(X \mid Y))$$
If r.v.s $X$ and $Y$ are independent, then $E(X \mid Y) = E(X)$.
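A simulation sketch of the tower property and the law of total variance; the two-stage model $N \sim \mathrm{Poisson}(\lambda)$, $X \mid N \sim \mathrm{Binomial}(N, p)$ is only an illustrative choice (numpy assumed; $\lambda$ and $p$ arbitrary):

```python
# Check E(X) = E(E(X|N)) and Var(X) = E(Var(X|N)) + Var(E(X|N))
# for N ~ Poisson(lam), X | N ~ Binomial(N, p).
import numpy as np

rng = np.random.default_rng(0)
lam, p, reps = 4.0, 0.3, 1_000_000          # arbitrary example values

N = rng.poisson(lam, size=reps)
X = rng.binomial(N, p)                      # one X per simulated N

cond_mean = N * p                           # E(X | N)
cond_var = N * p * (1 - p)                  # Var(X | N)

print(X.mean(), cond_mean.mean())                    # tower property
print(X.var(), cond_var.mean() + cond_mean.var())    # law of total variance
```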
Inequality
Markov's inequality
For a nonnegative r.v. $X$ and $a > 0$:
$$P(X \ge a) \le \frac{E(X)}{a}$$
Chebyshev's inequality
$$P(|X - E(X)| \ge a) \le \frac{\operatorname{Var}(X)}{a^{2}}$$
Chernoff bounds
The generic Chernoff bound requires only the moment generating function of $X$, defined as $M_X(t) = E(e^{tX})$, provided it exists. For every $t > 0$,
$$P(X \ge a) \le \frac{E(e^{tX})}{e^{ta}}$$
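A worked instance for $X \sim N(0,1)$: $M_X(t) = e^{t^{2}/2}$, so minimizing $e^{-ta}M_X(t)$ over $t > 0$ gives $t = a$ and the bound $e^{-a^{2}/2}$; a quick comparison with the exact tail (scipy assumed; the thresholds are arbitrary):

```python
# Chernoff bound for a standard normal: P(X >= a) <= exp(-a^2 / 2),
# obtained by minimizing exp(-t*a) * M_X(t) = exp(t^2/2 - t*a) at t = a.
import numpy as np
from scipy.stats import norm

for a in (1.0, 2.0, 3.0):                  # arbitrary example thresholds
    bound = np.exp(-a ** 2 / 2)
    exact = norm.sf(a)                     # actual tail probability
    print(a, exact, bound, exact <= bound) # the bound always holds
```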
Other inequalities can be seen here.